-
Notifications
You must be signed in to change notification settings - Fork 169
cam6_4_131: Performance improvements for CSLAM (from John Dennis, CISL, and Lauritzen) #1365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cam6_4_131: Performance improvements for CSLAM (from John Dennis, CISL, and Lauritzen) #1365
Conversation
|
I am not getting B4B in this test: ERP_D_Ln9.ne30pg3_ne30pg3_mg17.FHISTC_LTso.derecho_intel.cam-outfrq9s (low top FHISTC with CSLAM) @johnmauff: Could these changes be round-off (order of operation changes?) |
| if (present(fotherpanel)) then | ||
| fotherpanel (1-nht:nc+nht,1-nht:0 ,1)=fcube(1-nht:nc+nht,1-nht:0 ) | ||
| do halo=1,nhr | ||
| ftmp(:) = fcube(:,halo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that the copy to ftmp is not needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
| ! fill in "n" on Figure above | ||
| ! | ||
| do halo=1,nhr | ||
| ftmp(:) = fcube(:,halo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ftmp(:) is not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
Peter,
I would guess that they are round-off changes. I have not check with a
low top model. I only tested this with a high-top model, looking at the
nstep output.
John
…On Mon, Aug 18, 2025 at 8:42 AM Peter Hjort Lauritzen < ***@***.***> wrote:
*PeterHjortLauritzen* left a comment (ESCOMP/CAM#1365)
<#1365 (comment)>
I am not getting B4B in this test:
ERP_D_Ln9.ne30pg3_ne30pg3_mg17.FHISTC_LTso.derecho_intel.cam-outfrq9s
(low top FHISTC with CSLAM)
@johnmauff <https://github.com/johnmauff>: Could these changes be
round-off (order of operation changes?)
—
Reply to this email directly, view it on GitHub
<#1365 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADH7NUU3WXMFA2QRNJ45PRL3OHQ4XAVCNFSM6AAAAACEFIYUR2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTCOJXGIZDINJRGQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
| if(ns==3) then | ||
| dotproduct = DotProduct_3(w,f) | ||
| else | ||
| dotproduct = DotProduct_gen(w,f,ns) | ||
| endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnmauff I guess the NBFB answer comes from this function? When ns = 3, the dot product is hard coded for performance optimization, but the truncation error will be different from the general version since they are now computed in a single line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried the general algorithm but that did not change BFB differences with the baseline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PeterHjortLauritzen I was able to produce BFB result with the baseline after doing the following things:
- Fixed the typo (confirm by @johnmauff ) at line 1797 (https://github.com/ESCOMP/CAM/pull/1365/files#diff-8beef36006cafdecbf26406cf4357fc0797eb6c7316d9c6aa0a2486fade88f25R1797).
- Merged the latest
cam_developmentbranch into thedennis_perf_cslam1branch (currently thedennis_perf_cslam1branch is based on the tag cam6_4_089).
I ran the baroclinic wave test case (FKESSLER) and compared the optimized version against the baseline/trunk. Below is PS at day 10:
For comparison, here is a pertlim test (in this case perturbing PS by 1E-14):
The pertlim test produces errors about 100× smaller. It’s unclear whether it matters that the optimized code introduces round-off errors at every time step, whereas the pertlim test only introduces them at initialization. I'll keep looking/thinking ... UPDATE: all tests were due to code bug ... all tests (I am running) are now BFB |
|
@PeterHjortLauritzen I am curious: I thought threading was not supported for the SE dycore (#941). Thus why the ERP test still works here? |
|
@PeterHjortLauritzen I ran the |
| ! | ||
| w = halo_interp_weight(:,:,:,2) | ||
| do halo=1,nhr | ||
| ! ftmp(:) = fcube(nc+1-halo,:) ! copy to a temporary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PeterHjortLauritzen @johnmauff I think this line should not be commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I missed that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sjsprecious ! Now the results make sense ... see plots above (in a couple of minutes)
Don't know; I just took the test from the CAM test list without thinking much about the actual test (correct! Threading is currently broken in the SE dycore). @nusbaume Do you know the answer to @sjsprecious 's question? |
|
All the tests I am running are BFB now! Thanks @johnmauff and @sjsprecious ... @cacraigucar: This PR is ready to go ... |
|
Hi @PeterHjortLauritzen @sjsprecious, an ERP test takes the default thread and task count from the first case and divides it by 2 for the second case if the number is greater than one. You can see that logic in the CIME code here: https://github.com/ESMCI/cime/blob/master/CIME/SystemTests/erp.py#L32 Given that the default configuration for an SE dycore run is one thread but multiple MPI tasks, it ends up adjusting the tasks but not the threads (which is what allows the test to run with this CAM configuration in the first place). Also, my understanding is that the difference between ERS and ERP is that an ERS test won't change the task layout at all and simply checks that the restart run is bit-for-bit, while the ERP test will halve the processor count for the restarted run before checking if the results are bit-for-bit. Anyways, I hope that helps, and thanks again for getting these improvements into CAM! |
Thanks @nusbaume for your clarification. Clearly I misunderstood the ERS and ERP tests before, but now it is clear to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PeterHjortLauritzen could you look at line 487 of this file? The denominator is in a form that is inconsistent with other polynomial evaluations. Is this missing a parens? I.e. should it be invtmp = 1.0_r8 / (recons(6,i,j) + spherecentroid(1,i,j))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnmauff thanks for noting this!
I re-did the math for computing the min/max values and I think you are right. The if (abs(recons(6,i,j)) > threshold) code block is buggy and actually redundant.
Details, if interested, below:
We wish to find the min/max values of a polynomial on the form
First we find the interior critical points:
Compute the gradient and set it to zero
If
is non-zero
then the extrema are
These formulas are correct in the code.
Evaluate on the Boundaries
The boundary consists of four line segments. On each,
All these formulas are correct in the code.
However, the code block with if (abs(recons(6,i,j)) > threshold) is buggy (as you notice) and I think redundant as the cases are already covered with the search above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have run some FKESSLER cases to verify what happens without the buggy if (abs(recons(6,i,j)) > threshold) section:
- the non-linear correlation between cl1 and cl2 is still maintained: cly=constant!
- the min(upper plot)/max(lower plot) values for the slotted cylinders are pretty much the same.
(initial value: 0.100000000000000E+00; after 10 days: 0.999999999993388E-01 )
(initial value: 0.333333333333333E+00 0.333333333210990E+00 )
So no new over or undershoots introduced hence the " if (abs(recons(6,i,j)) > threshold)" code block can be removed!
The solutions are bit4bit until time-step 114 ()
< nstep, te 114 0.26393936945915651E+10 0.26393936951915169E+10 0.32751870252974244E-07 0.99999998855001060E+05 0.20548077300190920E+03
nstep, te 114 0.26393936945915661E+10 0.26393936951915169E+10 0.32751818191091291E-07 0.99999998855001075E+05 0.20548077300190920E+03
i.e. bit4bit until day 2.375
6067381 to
9e7393b
Compare
|
Peter,
Thanks for reverting the commit. It looks like Rory's changes are only a
couple of lines.
John
…On Mon, Sep 29, 2025 at 7:11 AM Peter Hjort Lauritzen < ***@***.***> wrote:
*PeterHjortLauritzen* left a comment (ESCOMP/CAM#1365)
<#1365 (comment)>
@pel <https://github.com/pel> I just noticed that commit 6067381
<6067381>
appears to deleate all of my optimizations. Is there a reason for this?
apologies ... mistake ... reverted!
—
Reply to this email directly, view it on GitHub
<#1365 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADH7NUWB6PZUQRFKWHLX2LD3VEVYJAVCNFSM6AAAAACEFIYUR2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTGNBWHA2TMMZZGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
I heard at a meeting the optimization work may be completed, with no deliverable for CAM (other than something which Peter will include in a future PR). Is this correct, and should this PR be closed? |
|
never mind - @PeterHjortLauritzen is working on this |
|
FYI: performance results on this Wiki https://github.com/PeterHjortLauritzen/CAM/wiki/Performance-notes-(2025) |
nusbaume
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks all! I found a few lines of commented-out code that could potentially be removed, but otherwise everything looks good to me.
| !not b4b ex1 = (r6*r3 - 2.0_r8*r5*r2) / disc + scx | ||
| !not b4b ex2 = (r6*r2 - 2.0_r8*r4*r3) / disc + scy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented-out code?
| ! Top/bottom edge, y=const., du/dx=0 | ||
| ! | ||
| if (abs(r4) > threshold) then | ||
| invtmp = 1.0_r8 / (2.0_r8 * r4)! + spherecentroid(1,i,j) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented-out code here?
| invtmp = 1.0_r8 / (2.0_r8 * r4)! + spherecentroid(1,i,j) | |
| invtmp = 1.0_r8 / (2.0_r8 * r4) |
| ! Top/bottom edge, y=const., du/dx=0 | ||
| ! | ||
| if (abs(r5) > threshold) then | ||
| invtmp = 1.0_r8 / (2.0_r8 * r5)! + spherecentroid(1,i,j) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented-out code here?
| invtmp = 1.0_r8 / (2.0_r8 * r5)! + spherecentroid(1,i,j) | |
| invtmp = 1.0_r8 / (2.0_r8 * r5) |
|
Thanks @nusbaume for the review. I have redone the math for finding extrema in CSLAM and found buggy code. I fixed it. Changes are not B4B but it takes over 2 days for differences to show up in FKESSLER. I did a science validation with FKESSLER (looking at traer properties) and things look good! |


Closes #1360